# Vectara

>[Vectara](https://vectara.com/) is the trusted GenAI platform for developers. It provides a simple API to build GenAI applications
> for semantic search or RAG (Retreieval augmented generation).

**Vectara Overview:**
- `Vectara` is developer-first API platform for building trusted GenAI applications. 
- To use Vectara - first [sign up](https://vectara.com/integrations/langchain) and create an account. Then create a corpus and an API key for indexing and searching.
- You can use Vectara's [indexing API](https://docs.vectara.com/docs/indexing-apis/indexing) to add documents into Vectara's index
- You can use Vectara's [Search API](https://docs.vectara.com/docs/search-apis/search) to query Vectara's index (which also supports Hybrid search implicitly).

## Installation and Setup

To use `Vectara` with LangChain no special installation steps are required. 
To get started, [sign up](https://vectara.com/integrations/langchain) and follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create a corpus and an API key. 
Once you have these, you can provide them as arguments to the Vectara vectorstore, or you can set them as environment variables.

- export `VECTARA_CUSTOMER_ID`="your_customer_id"
- export `VECTARA_CORPUS_ID`="your_corpus_id"
- export `VECTARA_API_KEY`="your-vectara-api-key"


## Vectara as a Vector Store

There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.

To import this vectorstore:
```python
from langchain.vectorstores import Vectara
```

To create an instance of the Vectara vectorstore:
```python
vectara = Vectara(
    vectara_customer_id=customer_id, 
    vectara_corpus_id=corpus_id, 
    vectara_api_key=api_key
)
```
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.

After you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
```


Since Vectara supports file-upload, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc) directly as file. When using this method, the file is uploaded directly to the Vectara backend, processed and chunked optimally there, so you don't have to use the LangChain document loader or chunking mechanism.

As an example:

```python
vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
```

To query the vectorstore, you can use the `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:
```python
results = vectara.similarity_score("what is LangChain?")
```
The results are returned as a list of relevant documents, and a relevance score of each document.

In this case, we used the default retrieval parameters, but you can also specify the following additional arguments in `similarity_search` or `similarity_search_with_score`:
- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
- `mmr_config`: can be used to specify MMR mode in the query.
   - `is_enabled`: True or False
   - `mmr_k`: number of results to use for MMR reranking
   - `diversity_bias`: 0 = no diversity, 1 = full diversity. This is the lambda parameter in the MMR formula and is in the range 0...1

## Vectara for Retrieval Augmented Generation (RAG)

Vectara provides a full RAG pipeline, including generative summarization. 
To use this pipeline, you can specify the `summary_config` argument in `similarity_search` or `similarity_search_with_score` as follows:

- `summary_config`: can be used to request an LLM summary in RAG
   - `is_enabled`: True or False
   - `max_results`: number of results to use for summary generation
   - `response_lang`: language of the response summary, in ISO 639-2 format (e.g. 'en', 'fr', 'de', etc)

## Example Notebooks

For a more detailed examples of using Vectara, see the following examples:
* [this notebook](/docs/integrations/vectorstores/vectara.html) shows how to use Vectara as a vectorstore for semantic search
* [this notebook](/docs/integrations/providers/vectara/vectara_chat.html) shows how to build a chatbot with Langchain and Vectara
* [this notebook](/docs/integrations/providers/vectara/vectara_summary.html) shows how to use the full Vectara RAG pipeline, including generative summarization
* [this notebook](/docs/integrations/retrievers/self_query/vectara_self_query.html) shows the self-query capability with Vectara.



